Enabling Interactive Analytics of Secure Data using Cloud Kotta
Research, especially in the social sciences and humanities, is increasingly
reliant on the application of data science methods to analyze large amounts of
(often private) data. Secure data enclaves provide a solution for managing and
analyzing private data. However, such enclaves do not readily support discovery
science---a form of exploratory or interactive analysis by which researchers
execute a range of (sometimes large) analyses in an iterative and collaborative
manner. The batch computing model offered by many data enclaves is well suited
to executing large compute tasks; however it is far from ideal for day-to-day
discovery science. As researchers must submit jobs to queues and wait for
results, the high latencies inherent in queue-based, batch computing systems
hinder interactive analysis. In this paper we describe how we have augmented
the Cloud Kotta secure data enclave to support collaborative and interactive
analysis of sensitive data. Our model uses Jupyter notebooks as a flexible
analysis environment and Python language constructs to support the execution of
arbitrary functions on private data within this secure framework.
Comment: To appear in Proceedings of Workshop on Scientific Cloud Computing,
Washington, DC, USA, June 2017 (ScienceCloud 2017), 7 pages.
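The pattern the abstract describes, wrapping arbitrary Python functions so they run next to the private data rather than in the researcher's notebook, can be sketched as a decorator. Everything below is illustrative: the registry standing in for the enclave and all function names are assumptions, not the actual Cloud Kotta API.

```python
def enclave_task(func):
    """Mark a function for execution inside the secure enclave rather
    than locally (sketch; the real system would serialize the function
    and ship it to the enclave's workers)."""
    def wrapper(*args, **kwargs):
        return _enclave_execute(func, args, kwargs)
    return wrapper

# Simulated enclave side: the private data never leaves this scope.
_PRIVATE_DATA = {"survey_responses": ["r1", "r2", "r3"]}

def _enclave_execute(func, args, kwargs):
    # In the real system the function runs on a worker with access to the
    # secure store; only the (non-sensitive) result returns to the notebook.
    return func(_PRIVATE_DATA, *args, **kwargs)

@enclave_task
def count_records(data, dataset_id):
    # Arbitrary analysis code executed against the private data.
    return len(data[dataset_id])

print(count_records("survey_responses"))  # prints 3: only the aggregate leaves
```

From a notebook, the researcher calls `count_records` like any local function; the decorator is what makes the call interactive rather than a batch-queue submission.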
Workflows Community Summit: Bringing the Scientific Workflows Community Together
Scientific workflows have been used almost universally across scientific
domains, and have underpinned some of the most significant discoveries of the
past several decades. Many of these workflows have high computational, storage,
and/or communication demands, and thus must execute on a wide range of
large-scale platforms, from large clouds to upcoming exascale high-performance
computing (HPC) platforms. These executions must be managed using some software
infrastructure. Due to the popularity of workflows, workflow management systems
(WMSs) have been developed to provide abstractions for creating and executing
workflows conveniently, efficiently, and portably. While these efforts are all
worthwhile, there are now hundreds of independent WMSs, many of which are
moribund. As a result, the WMS landscape is segmented and presents significant
barriers to entry due to the hundreds of seemingly comparable, yet
incompatible, systems that exist. As a result, many teams, small and large,
still elect to build their own custom workflow solution rather than adopt, or
build upon, existing WMSs. This current state of the WMS landscape negatively
impacts workflow users, developers, and researchers. The "Workflows Community
Summit" was held online on January 13, 2021. The overarching goal of the summit
was to develop a view of the state of the art and identify crucial research
challenges in the workflow community. Prior to the summit, a survey sent to
stakeholders in the workflow community (including both developers of WMSs and
users of workflows) helped to identify key challenges in this community that
were translated into 6 broad themes for the summit, each of them being the
object of a focused discussion led by a volunteer member of the community. This
report documents and organizes the wealth of information provided by the
participants before, during, and after the summit.
DESC DC2 Data Release Note
In preparation for cosmological analyses of the Vera C. Rubin Observatory Legacy Survey of Space and Time (LSST), the LSST Dark Energy Science Collaboration (LSST DESC) has created a 300 deg² simulated survey as part of an effort called Data Challenge 2 (DC2). The DC2 simulated sky survey, in six optical bands with observations following a reference LSST observing cadence, was processed with the LSST Science Pipelines (19.0.0). In this Note, we describe the public data release of the resulting object catalogs for the coadded images of five years of simulated observations along with associated truth catalogs. We include a brief description of the major features of the available data sets. To enable convenient access to the data products, we have developed a web portal connected to Globus data services. We describe how to access the data and provide example Jupyter Notebooks in Python to aid first interactions with the data. We welcome feedback and questions about the data release via a GitHub repository.
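A minimal sketch of the kind of "first interaction" the release's example notebooks support: selecting objects from a catalog by a magnitude cut. The toy rows and column names below are invented placeholders, not the actual DESC object-catalog schema or access API.

```python
# Toy stand-in for a DC2-style object catalog (placeholder schema).
catalog = [
    {"objectId": 1, "ra": 55.1, "dec": -29.8, "mag_r": 22.4},
    {"objectId": 2, "ra": 55.3, "dec": -29.9, "mag_r": 25.1},
    {"objectId": 3, "ra": 55.2, "dec": -30.0, "mag_r": 23.7},
]

def bright_objects(rows, mag_limit=24.0, band="r"):
    """Keep sources brighter than mag_limit (smaller magnitude = brighter)."""
    key = f"mag_{band}"
    return [r for r in rows if r[key] < mag_limit]

selected = bright_objects(catalog)
print(len(selected))  # 2 of the 3 toy rows pass the r < 24 cut
```

In practice the catalogs would be fetched through the Globus-backed portal and loaded with a tabular library rather than hand-built lists; the cut itself is the same idea.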
The LSST DESC DC2 Simulated Sky Survey
We describe the simulated sky survey underlying the second data challenge (DC2) carried out in preparation for analysis of the Vera C. Rubin Observatory Legacy Survey of Space and Time (LSST) by the LSST Dark Energy Science Collaboration (LSST DESC). Significant connections across multiple science domains will be a hallmark of LSST; the DC2 program represents a unique modeling effort that stresses this interconnectivity in a way that has not been attempted before. This effort encompasses a full end-to-end approach: starting from a large N-body simulation, through setting up LSST-like observations including realistic cadences, through image simulations, and finally processing with Rubin's LSST Science Pipelines. This last step ensures that we generate data products resembling those to be delivered by the Rubin Observatory as closely as is currently possible. The simulated DC2 sky survey covers six optical bands in a wide-fast-deep area of approximately 300 deg², as well as a deep drilling field of approximately 1 deg². We simulate 5 yr of the planned 10 yr survey. The DC2 sky survey has multiple purposes. First, the LSST DESC working groups can use the data set to develop a range of DESC analysis pipelines to prepare for the advent of actual data. Second, it serves as a realistic test bed for the image processing software under development for LSST by the Rubin Observatory. In particular, simulated data provide a controlled way to investigate certain image-level systematic effects. Finally, the DC2 sky survey enables the exploration of new scientific ideas in both static and time domain cosmology.